archived/sentiment_parallel_batch/huggingface_sentiment_parallel_batch.ipynb (1,967 lines of code) (raw):

{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Distributed Training with Hugging Face Sentiment Classification and Batch Transform\n", "__Binary Classification with `Trainer` and `sst2` dataset, using Distributed Training and Batch Transform__" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook.\n", "\n", "![ This us-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-west-2/advanced_functionality|sentiment_parallel_batch|huggingface_sentiment_parallel_batch.ipynb)\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "## Runtime\n", "\n", "This notebook takes approximately 60 minutes to run." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Contents\n", "\n", "1. [Introduction](#Introduction) \n", "2. [Development environment and permissions](#Development-environment-and-permissions)\n", " 1. [Installation](#Installation) \n", " 2. [Development environment](#Development-environment) \n", " 3. [Permissions](#Permissions)\n", "3. [Pre-processing](#Pre-processing)\n", " 1. [Download the dataset](#Download-the-dataset)\n", " 1. [Tokenize sentences](#Tokenize-sentences) \n", " 2. [Upload data to sagemaker_session_bucket](#Upload-data-to-sagemaker_session_bucket) \n", "4. [Fine-tune the model and start a SageMaker training job](#Fine-tune-the-model-and-start-a-SageMaker-training-job)\n", " 1. [Enabling Debugger in Estimator object](#Enabling-Debugger-in-Estimator-object)\n", " 1. [Create an Estimator and start a training job](#Create-an-Estimator-and-start-a-training-job) \n", "5. [Run Batch Transform after training a model](#Run-Batch-Transform-after-training-a-model)\n", " 1. 
[Generate dummy data](#Generate-dummy-data)\n", " 2. [Run the Batch Transform job](#Run-the-Batch-Transform-job)\n", " 3. [Compare the dummy data to the predicted sentiments](#Compare-the-dummy-data-to-the-predicted-sentiments)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction\n", "\n", "This notebook walks you through an end-to-end binary text classification example, using [SageMaker Distributed Training](https://docs.aws.amazon.com/sagemaker/latest/dg/distributed-training.html) for the training step and [SageMaker Batch Transform](https://docs.aws.amazon.com/sagemaker/latest/dg/batch-transform.html) for inference. If you are looking for an example notebook that trains on a single instance and uses a real-time model endpoint, please visit [Hugging Face Sentiment Classification](https://github.com/aws/amazon-sagemaker-examples/blob/b07198857bbfcf5b85c8a59357e12c49ccb91f5c/introduction_to_applying_machine_learning/huggingface_sentiment_classification/huggingface_sentiment.ipynb).\n", "\n", "This notebook uses Hugging Face's `transformers` library with a custom Amazon sagemaker-sdk extension to fine-tune a pre-trained transformer on binary text classification using data parallelism. The pre-trained model is fine-tuned using the `sst2` dataset. The notebook then runs batch inference on generated dummy data, before analyzing the results. To get started, we need to set up the environment with a few prerequisite steps, including permissions and configuration.\n", "\n", "This notebook is adapted from two of Hugging Face's notebooks: [HuggingFace Sagemaker-sdk - Getting Started Demo](https://github.com/huggingface/notebooks/blob/main/sagemaker/01_getting_started_pytorch/sagemaker-notebook.ipynb) and [HuggingFace Sagemaker-sdk - training with custom metrics](https://github.com/huggingface/notebooks/blob/main/sagemaker/06_sagemaker_metrics/sagemaker-notebook.ipynb). 
These are provided here courtesy of Hugging Face.\n", "\n", "<i>NOTE: You can run this notebook in SageMaker Studio, a SageMaker notebook instance, or your local machine. This notebook was tested in a notebook instance using the `conda\\_pytorch\\_p39`\n", " kernel.</i>\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Development environment and permissions " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Installation\n", "\n", "_*Note:* We install the required libraries from Hugging Face and AWS. You also need PyTorch, if you haven't installed it already._" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "!pip install \"sagemaker==2.187.0\" \"transformers==4.33.2\" \"datasets==2.14.5\" \"s3fs==2023.6.0\" \"awscli==1.29.17\" \"accelerate==0.23.0\" \"ipywidgets==7.1.1\" \"smdebug==1.0.34\" \"seaborn==0.13.0\" --upgrade" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Development environment " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "import sagemaker.huggingface" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Permissions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "_If you are going to use SageMaker in a local environment, you need access to an IAM Role with the required permissions for SageMaker. 
You can read more at [SageMaker Roles](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html)._" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import sagemaker\n", "\n", "sess = sagemaker.Session()\n", "# The SageMaker session bucket is used for uploading data, models and logs\n", "# SageMaker will automatically create this bucket if it doesn't exist\n", "sagemaker_session_bucket = None\n", "if sagemaker_session_bucket is None and sess is not None:\n", " # Set to default bucket if a bucket name is not given\n", " sagemaker_session_bucket = sess.default_bucket()\n", "\n", "role = sagemaker.get_execution_role()\n", "sess = sagemaker.Session(default_bucket=sagemaker_session_bucket)\n", "\n", "print(f\"Role arn: {role}\")\n", "print(f\"Bucket: {sess.default_bucket()}\")\n", "print(f\"Region: {sess.boto_region_name}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Pre-processing\n", "\n", "We use the `datasets` library to pre-process the `sst2` dataset (Stanford Sentiment `Treebank`). After pre-processing, the dataset is uploaded to the `sagemaker_session_bucket` for use within the training job. The [sst2](https://nlp.stanford.edu/sentiment/index.html) dataset consists of 67349 training samples and _ testing samples of highly polar movie reviews." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Download the dataset" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import string\n", "import random\n", "from datasets import Dataset\n", "from transformers import AutoTokenizer\n", "import pandas as pd\n", "\n", "# Tokenizer used in pre-processing\n", "tokenizer_name = \"distilbert-base-uncased\"\n", "\n", "# S3 key prefix for the data\n", "random_string = \"\".join(random.choices(string.ascii_lowercase + string.digits, k=12))\n", "s3_prefix = \"DEMO-sentiment-analyses-\" + random_string + \"/datasets/sst2\"\n", "\n", "# Download the SST2 data from s3\n", "!curl https://sagemaker-example-files-prod-us-east-1.s3.amazonaws.com/datasets/text/SST2/sst2.test > ./sst2.test\n", "!curl https://sagemaker-example-files-prod-us-east-1.s3.amazonaws.com/datasets/text/SST2/sst2.train > ./sst2.train\n", "!curl https://sagemaker-example-files-prod-us-east-1.s3.amazonaws.com/datasets/text/SST2/sst2.val > ./sst2.val" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Tokenize sentences" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Download tokenizer\n", "tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)\n", "\n", "\n", "# Tokenizer helper function\n", "def tokenize(batch):\n", " return tokenizer(batch[\"text\"], padding=\"max_length\", truncation=True)\n", "\n", "\n", "# Load dataset\n", "test_df = pd.read_csv(\"sst2.test\", sep=\"delimiter\", header=None, engine=\"python\", names=[\"line\"])\n", "train_df = pd.read_csv(\"sst2.train\", sep=\"delimiter\", header=None, engine=\"python\", names=[\"line\"])\n", "\n", "test_df[[\"label\", \"text\"]] = test_df[\"line\"].str.split(\" \", n=1, expand=True)\n", "train_df[[\"label\", \"text\"]] = train_df[\"line\"].str.split(\" \", n=1, expand=True)\n", "\n", "test_df.drop(\"line\", axis=1, inplace=True)\n", "train_df.drop(\"line\", axis=1, 
inplace=True)\n", "\n", "test_df[\"label\"] = pd.to_numeric(test_df[\"label\"], downcast=\"integer\")\n", "train_df[\"label\"] = pd.to_numeric(train_df[\"label\"], downcast=\"integer\")\n", "\n", "train_dataset = Dataset.from_pandas(train_df)\n", "test_dataset = Dataset.from_pandas(test_df)\n", "\n", "# Tokenize dataset\n", "train_dataset = train_dataset.map(tokenize, batched=True)\n", "test_dataset = test_dataset.map(tokenize, batched=True)\n", "\n", "# Set format for pytorch\n", "train_dataset = train_dataset.rename_column(\"label\", \"labels\")\n", "train_dataset.set_format(\"torch\", columns=[\"input_ids\", \"attention_mask\", \"labels\"])\n", "\n", "test_dataset = test_dataset.rename_column(\"label\", \"labels\")\n", "test_dataset.set_format(\"torch\", columns=[\"input_ids\", \"attention_mask\", \"labels\"])" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "train_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Upload data to `sagemaker_session_bucket`\n", "\n", "After processing the `datasets`, we upload the dataset to S3." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from s3fs import S3FileSystem\n", "\n", "s3 = S3FileSystem(storage_options=\"s3\")\n", "\n", "# save train_dataset to s3\n", "training_input_path = f\"s3://{sess.default_bucket()}/{s3_prefix}/train/input\"\n", "train_dataset.save_to_disk(training_input_path)\n", "\n", "# save test_dataset to s3\n", "test_input_path = f\"s3://{sess.default_bucket()}/{s3_prefix}/test/input\"\n", "test_dataset.save_to_disk(test_input_path)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Fine-tune the model and start a SageMaker training job\n", "\n", "In order to create a SageMaker training job, we need a `HuggingFace` Estimator. The Estimator handles end-to-end Amazon SageMaker training and deployment tasks. 
In an Estimator, we define which fine-tuning script should be used as `entry_point`, which `instance_type` should be used, which `hyperparameters` are passed in, as well as the following:\n", "\n", "\n", "\n", "```python\n", "hf_estimator = HuggingFace(entry_point=\"train.py\",\n", " source_dir=\"./scripts\",\n", " base_job_name=\"huggingface-sdk-extension\",\n", " instance_type=\"ml.p3.2xlarge\",\n", " instance_count=1,\n", " transformers_version=\"4.4\",\n", " pytorch_version=\"1.6\",\n", " py_version=\"py36\",\n", " role=role,\n", " hyperparameters = {\"epochs\": 1,\n", " \"train_batch_size\": 32,\n", " \"model_name\":\"distilbert-base-uncased\"\n", " })\n", "```\n", "\n", "When we create a SageMaker training job, SageMaker takes care of starting and managing all the required EC2 instances for us with the `huggingface` container, uploads the provided fine-tuning script `train.py`, and downloads the data from the `sagemaker_session_bucket` into the container at `/opt/ml/input/data`. Then, it starts the training job by running:\n", "\n", "```python\n", "/opt/conda/bin/python train.py --epochs 1 --model_name distilbert-base-uncased --train_batch_size 32\n", "```\n", "\n", "The `hyperparameters` defined in the `HuggingFace` estimator are passed in as named arguments. \n", "\n", "SageMaker provides useful properties about the training environment through various environment variables, including the following:\n", "\n", "* `SM_MODEL_DIR`: A string representing the path where the training job writes the model artifacts to. After training, artifacts in this directory are uploaded to S3 for model hosting.\n", "\n", "* `SM_NUM_GPUS`: An integer representing the number of GPUs available to the host.\n", "\n", "* `SM_CHANNEL_XXXX:` A string representing the path to the directory that contains the input data for the specified channel. 
For example, if you specify two input channels in the Hugging Face estimator's `fit()` call, named `train` and `test`, the environment variables `SM_CHANNEL_TRAIN` and `SM_CHANNEL_TEST` are set.\n", "\n", "\n", "To run the training job locally, you can define `instance_type=\"local\"` or `instance_type=\"local_gpu\"` for GPU usage.\n", "\n", "_Note: local mode is not supported in SageMaker Studio._\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pygmentize ./scripts/train.py" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Enabling Debugger in Estimator object\n", "\n", "\n", "#### DebuggerHookConfig\n", "\n", "Enabling Amazon SageMaker Debugger in training job can be accomplished by adding its configuration into Estimator object constructor:\n", "\n", "```python\n", "from sagemaker.debugger import DebuggerHookConfig, CollectionConfig\n", "\n", "estimator = Estimator(\n", " ...,\n", " debugger_hook_config = DebuggerHookConfig(\n", " s3_output_path=\"s3://{bucket_name}/{location_in_bucket}\", # Required\n", " collection_configs=[\n", " CollectionConfig(\n", " name=\"weights\",\n", " parameters={\n", " \"save_interval\": \"10\"\n", " }\n", " )\n", " ]\n", " )\n", ")\n", "```\n", "Here, the `DebuggerHookConfig` object instructs `Estimator` what data we are interested in.\n", "Two parameters are provided in the example:\n", "\n", "- `s3_output_path`: it points to S3 bucket/path where we intend to store our debugging tensors.\n", " Amount of data saved depends on multiple factors, major ones are: training job / data set / model / frequency of saving tensors.\n", " This bucket should be in your AWS account, and you should have full access control over it.\n", " **Important Note**: this s3 bucket should be originally created in the same region where your training job will be running, otherwise you might run into problems with cross region access.\n", "\n", "- `collection_configs`: it enumerates named 
collections of tensors we want to save.\n", " Collections are a convenient way to organize relevant tensors under the same umbrella to make it easy to navigate them during analysis.\n", " In this particular example, you are instructing Amazon SageMaker Debugger that you are interested in a single collection named `weights`.\n", " We also instructed Amazon SageMaker Debugger to save this collection every 10 iterations.\n", " See [Collection](https://github.com/awslabs/sagemaker-debugger/blob/master/docs/api.md#collection) documentation for all parameters that are supported by Collections and `DebuggerHookConfig` documentation for more details about all parameters `DebuggerHookConfig` supports.\n", " \n", "#### Training Script\n", " \n", "You may have noticed that the training script has been adapted to work with SageMaker Debugger. We have done so here, but note that this is not required for certain versions of PyTorch and TensorFlow. See [Supported Frameworks and Algorithms](https://docs.aws.amazon.com/sagemaker/latest/dg/debugger-supported-frameworks.html) for more information. \n", "\n", "In the above training script we have defined a class for Debugger. This allows us to pass a Debugger hook to the Trainer as part of the training process. To do this, we are using the [`SMDebug` client library](https://docs.aws.amazon.com/sagemaker/latest/dg/debugger-bring-your-own-container.html). For more information, please see the associated page for [PyTorch](https://github.com/awslabs/sagemaker-debugger/blob/master/docs/pytorch.md). 
Further instructions can also be found in [Adapt Your PyTorch Training Script](https://docs.aws.amazon.com/sagemaker/latest/dg/debugger-modify-script-pytorch.html).\n", " \n", "#### Rules\n", "\n", "Enabling Rules in a training job can be accomplished by adding the `rules` configuration into the Estimator object constructor.\n", "\n", "- `rules`: This parameter accepts a list of rules you wish to evaluate against the tensors output by this training job.\n", " For rules, Amazon SageMaker Debugger supports two types:\n", " - SageMaker Rules: These are rules specially curated by the data science and engineering teams in Amazon SageMaker which you can opt to evaluate against your training job.\n", " - Custom Rules: You can optionally choose to write your own rule as a Python source file and have it evaluated against your training job.\n", " For Amazon SageMaker Debugger to evaluate this rule, you have to provide the S3 location of the rule source and the evaluator image.\n", "\n", "In this example, you will use Amazon SageMaker's LossNotDecreasing rule, which helps you identify whether you are running into a situation where the training loss is not going down.\n", "\n", "```python\n", "from sagemaker.debugger import rule_configs, Rule\n", "\n", "estimator = Estimator(\n", " ...,\n", " rules=[\n", " Rule.sagemaker(\n", " rule_configs.loss_not_decreasing(),\n", " rule_parameters={\n", " \"collection_names\": \"losses\",\n", " \"num_steps\": \"10\",\n", " },\n", " ),\n", " ],\n", ")\n", "```\n", "\n", "- `rule_parameters`: In this parameter, you provide the runtime values of the parameter in your constructor.\n", " You can still choose to pass in other values which may be necessary for your rule to be evaluated.\n", " In this example, you will use Amazon SageMaker's LossNotDecreasing rule to monitor the `metrics` collection.\n", " The rule will alert you if the tensors in `metrics` have not decreased for more than 10 steps." 
] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "### Use SageMaker Distributed Training Data Parallelism\n", "\n", "[Amazon SageMaker's distributed library](https://docs.aws.amazon.com/sagemaker/latest/dg/distributed-training.html) supports various distributed training options for deep learning tasks such as computer vision (CV) and natural language processing (NLP). With SageMaker’s distributed training libraries, you can run highly scalable and cost-effective custom data parallel and model parallel deep learning training jobs. Here, we will be using the distributed library for NLP [data parallelism](https://docs.aws.amazon.com/sagemaker/latest/dg/data-parallel.html). The SageMaker data parallelism library extends SageMaker training capabilities on deep learning models with near-linear scaling efficiency, achieving fast time-to-train with minimal code changes. It is available through the AWS deep learning containers for the TensorFlow, PyTorch, and HuggingFace frameworks within the SageMaker training platform.\n", "\n", "As per the [Hugging Face documentation](https://huggingface.co/docs/sagemaker/train#data-parallelism), The Hugging Face [Trainer](https://huggingface.co/docs/transformers/main_classes/trainer) supports SageMaker’s data parallelism library. If your training script uses the Trainer API, you only need to define the distribution parameter in the Hugging Face Estimator:\n", "\n", "```python\n", "distribution = {'smdistributed':{'dataparallel':{ 'enabled': True }}}\n", "```\n", "\n", "Please see further information in [Launch a Training job](https://docs.aws.amazon.com/sagemaker/latest/dg/data-parallel-use-api.html).\n", "\n", "**Instance types**\n", "\n", "`SMDataParallel` supports larger compute instances that have 8 GPUs per node:\n", "1. ml.p3.16xlarge\n", "1. ml.p3dn.24xlarge\n", "1. ml.p4d.24xlarge\n", "1. 
ml.p4de.24xlarge\n", "\n", "For specs of the instance types, see the Accelerated Computing section in the [Amazon EC2 Instance Types page](https://aws.amazon.com/ec2/instance-types/). For information about instance pricing, see [Amazon SageMaker Pricing](https://aws.amazon.com/sagemaker/pricing/). In this example, we will be using an ml.p3.16xlarge.\n", "\n", "Note: if you encounter a `ResourceLimitExceeded` error message during training, you may need to [request a service quota increase for SageMaker resources](https://docs.aws.amazon.com/sagemaker/latest/dg/regions-quotas.html#service-limit-increase-request-procedure).\n", "\n", "**Instance count**\n", "\n", "To get the best performance and the most out of `SMDataParallel`, you should use at least 2 instances, but you can also use 1 for testing this example." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create an Estimator and start a training job" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we'll create the estimator. For this we will include:\n", "1. The IAM role to use\n", "1. The training instance type and count\n", "1. The `HuggingFace` algorithm container and script\n", "1. The hyperparameters which are passed to the training job\n", "1. The distribution strategy that allows us to perform data parallelism\n", "1. The `DebuggerHookConfig` object which saves the specific tensors for debugging\n", "1. The `LossNotDecreasing` rule which detects when the loss is not decreasing in value at an adequate rate\n", "\n", "We then set the algorithm hyperparameters and call the `.fit()` function, passing it the S3 locations of the input data. In this case, we pass in both a training set and a test set. Note that in the `DebuggerHookConfig` we are specifying weights, biases, and gradients. 
A fourth collection, losses, is also saved by default when using SageMaker Debugger with PyTorch.\n", "\n", "We also specify and pass to the Estimator metric definitions, with associated regex patterns. This will be used to parse the job logs and extract metrics, which will be used for visualization after the job completes." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sagemaker.huggingface import HuggingFace\n", "from sagemaker.pytorch import PyTorch\n", "\n", "# Hyperparameters which are passed into the training job\n", "hyperparameters = {\n", " \"epochs\": 2,\n", " \"train_batch_size\": 32,\n", " \"model_name\": \"distilbert-base-uncased\",\n", "}" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "metric_definitions = [\n", " {\"Name\": \"loss\", \"Regex\": \"'loss': ([0-9]+(.|e\\-)[0-9]+),?\"},\n", " {\"Name\": \"learning_rate\", \"Regex\": \"'learning_rate': ([0-9]+(.|e\\-)[0-9]+),?\"},\n", " {\"Name\": \"train_loss\", \"Regex\": \"'train_loss': ([0-9\\\\.]+)\"},\n", " {\"Name\": \"eval_loss\", \"Regex\": \"'eval_loss': ([0-9]+(.|e\\-)[0-9]+),?\"},\n", " {\"Name\": \"eval_accuracy\", \"Regex\": \"'eval_accuracy': ([0-9]+(.|e\\-)[0-9]+),?\"},\n", " {\"Name\": \"eval_f1\", \"Regex\": \"'eval_f1': ([0-9]+(.|e\\-)[0-9]+),?\"},\n", " {\"Name\": \"eval_precision\", \"Regex\": \"'eval_precision': ([0-9]+(.|e\\-)[0-9]+),?\"},\n", " {\"Name\": \"eval_recall\", \"Regex\": \"'eval_recall': ([0-9]+(.|e\\-)[0-9]+),?\"},\n", " {\"Name\": \"eval_runtime\", \"Regex\": \"'eval_runtime': ([0-9]+(.|e\\-)[0-9]+),?\"},\n", " {\n", " \"Name\": \"eval_samples_per_second\",\n", " \"Regex\": \"'eval_samples_per_second': ([0-9]+(.|e\\-)[0-9]+),?\",\n", " },\n", " {\"Name\": \"epoch\", \"Regex\": \"'epoch': ([0-9]+(.|e\\-)[0-9]+),?\"},\n", "]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sagemaker.debugger import 
rule_configs, Rule, DebuggerHookConfig, CollectionConfig\n", "from sagemaker.estimator import Estimator\n", "\n", "save_interval = 5\n", "\n", "hf_estimator = HuggingFace(\n", " entry_point=\"train.py\",\n", " source_dir=\"./scripts\",\n", " instance_type=\"ml.p3.16xlarge\",\n", " processes_per_host=8,\n", " instance_count=2,\n", " role=role,\n", " transformers_version=\"4.28.1\",\n", " pytorch_version=\"2.0.0\",\n", " datasets_version=\"2.14.5\",\n", " volume_size=50,\n", " py_version=\"py310\",\n", " hyperparameters=hyperparameters,\n", " metric_definitions=metric_definitions,\n", " s3_output_path=training_input_path + \"/files\",\n", " enable_sagemaker_metrics=True,\n", " distribution={\n", " \"smdistributed\": {\"dataparallel\": {\"enabled\": True}}\n", " }, # Training using SMDataParallel Distributed Training Framework\n", " debugger_hook_config=DebuggerHookConfig(\n", " s3_output_path=f\"s3://{sess.default_bucket()}/{s3_prefix}/train/debug\", # Required\n", " collection_configs=[\n", " CollectionConfig(name=\"weights\", parameters={\"save_interval\": str(save_interval)}),\n", " CollectionConfig(name=\"biases\", parameters={\"save_interval\": str(save_interval)}),\n", " CollectionConfig(name=\"gradients\", parameters={\"save_interval\": str(save_interval)}),\n", " ],\n", " ),\n", " rules=[\n", " Rule.sagemaker(\n", " rule_configs.loss_not_decreasing(),\n", " rule_parameters={\n", " \"collection_names\": \"metrics\",\n", " \"num_steps\": str(save_interval * 2),\n", " },\n", " ),\n", " ],\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "# Start the training job with the uploaded dataset as input\n", "hf_estimator.fit(\n", " {\"train\": training_input_path, \"test\": test_input_path},\n", " # This is a fire and forget event. 
By setting wait=False, you submit the job to run in the background.\n", " # Amazon SageMaker starts one training job and releases control to the next cells in the notebook.\n", " # Follow this notebook to see the status of the training job.\n", " wait=False,\n", ")" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "#### Result" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a result of the above command, Amazon SageMaker starts one training job and one rule job for you. The first one is the job that produces the tensors to be analyzed. The second one analyzes the tensors to check whether the loss stops decreasing at any point during training.\n", "\n", "Check the status of the training job below. After your training job is started, Amazon SageMaker starts a rule-execution job to run the LossNotDecreasing rule.\n", "\n", "Note that the next cell blocks until the rule execution job ends. You can stop it at any point to proceed to the rest of the notebook. Once it says Rule Evaluation Status is Started, and shows the `RuleEvaluationJobArn`, you can look at the status of the rule being monitored.\n", "\n", "Note that the execution of the below cell will take around 30 minutes." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "import time\n", "from time import gmtime, strftime\n", "\n", "# Below command will give the status of training job\n", "job_name = hf_estimator.latest_training_job.name\n", "client = hf_estimator.sagemaker_session.sagemaker_client\n", "description = client.describe_training_job(TrainingJobName=job_name)\n", "rule_job_summary = hf_estimator.latest_training_job.rule_job_summary()\n", "print(\"Training job name: \" + job_name)\n", "print(description[\"TrainingJobStatus\"])\n", "\n", "if description[\"TrainingJobStatus\"] != \"Completed\" or rule_job_summary[0][\n", " \"RuleEvaluationStatus\"\n", "] not in [\n", " \"IssuesFound\",\n", " \"NoIssuesFound\",\n", " \"Failed\",\n", " \"Stopped\",\n", "]:\n", " while (\n", " (description[\"TrainingJobStatus\"] == \"InProgress\")\n", " or (rule_job_summary[0][\"RuleEvaluationStatus\"] == \"InProgress\")\n", " or (description[\"SecondaryStatus\"] not in [\"Training\", \"Completed\", \"Failed\", \"Stopped\"])\n", " ):\n", " description = client.describe_training_job(TrainingJobName=job_name)\n", " rule_job_summary = hf_estimator.latest_training_job.rule_job_summary()\n", " print(\n", " \"{}: {}, {}, {}\".format(\n", " strftime(\"%X\", gmtime()),\n", " description[\"TrainingJobStatus\"],\n", " description[\"SecondaryStatus\"],\n", " rule_job_summary[0][\"RuleEvaluationStatus\"],\n", " )\n", " )\n", " time.sleep(15)" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "### Check the status of the Rule Evaluation Job" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To get the rule evaluation job that Amazon SageMaker started for you, run the command below. The results show you the `RuleConfigurationName`, `RuleEvaluationJobArn`, `RuleEvaluationStatus`, and `StatusDetails`. 
If the model parameters meet a rule evaluation condition, the rule execution job throws a client error with `RuleEvaluationConditionMet`.\n", "\n", "The logs of the rule evaluation job are available in the CloudWatch Log Stream `/aws/sagemaker/ProcessingJobs` with `RuleEvaluationJobArn`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "hf_estimator.latest_training_job.rule_job_summary()[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data Analysis - Manual\n", "\n", "Now that you've trained the system, you can analyze the data. Here, you focus on after-the-fact analysis.\n", "\n", "You import a basic analysis library, which defines the concept of trial, which represents a single training run." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "from smdebug.trials import create_trial\n", "\n", "path = hf_estimator.latest_job_debugger_artifacts_path()\n", "smdebug_trial = create_trial(path)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "# cell 18\n", "from smdebug.trials import create_trial\n", "\n", "description = client.describe_training_job(TrainingJobName=job_name)\n", "s3_output_path = hf_estimator.latest_job_debugger_artifacts_path()\n", "\n", "# This is where we create a Trial object that allows access to saved tensors.\n", "trial = create_trial(s3_output_path)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can list all the tensors that you know something about. Each one of these names is the name of a tensor. The name is a combination of the feature name, and whether it is a weight, bias, gradient, or loss value." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "trial.tensor_names()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For each tensor, ask for the steps where you have data. In this case, data is saved every five steps." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "trial.tensor(\"DistilBertForSequenceClassification_classifier.weight\").values()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Saving these collections of weights, biases, gradients and losses to S3 allows us to analyze them over the course of the training job and refine the model where necessary.\n", "\n", "Additionally, we can produce a table of the metrics recorded from the `metric_definitions` list we created earlier, as well as visualize them over time. Here, we visualize the loss, as well as the evaluation loss, accuracy, and precision." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "from sagemaker import TrainingJobAnalytics\n", "\n", "# Captured metrics can be accessed as a Pandas dataframe\n", "df = TrainingJobAnalytics(training_job_name=hf_estimator.latest_training_job.name).dataframe()\n", "df.head(15)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from matplotlib import pyplot as plt\n", "import seaborn as sns\n", "\n", "plt.rcParams[\"figure.figsize\"] = [20, 5]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "evals = df[df.metric_name.isin([\"eval_accuracy\", \"eval_precision\"])]\n", "losses = df[df.metric_name.isin([\"loss\", \"eval_loss\"])]\n", "\n", "ax = sns.lineplot(\n", " x=\"timestamp\", y=\"value\", data=evals, hue=\"metric_name\", palette=[\"green\", \"purple\"]\n", ")\n", "\n", "ax2 = plt.twinx()\n", "\n", "sns.lineplot(\n", " x=\"timestamp\",\n", " 
y=\"value\",\n", " data=losses,\n", " hue=\"metric_name\",\n", " palette=[\"red\", \"blue\"],\n", " ax=ax2,\n", ")\n", "\n", "ax.legend(bbox_to_anchor=(1, 1))\n", "ax2.legend(bbox_to_anchor=(1, 0.8))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Run Batch Transform after training a model \n", "\n", "After you've trained the model, you can use [Amazon SageMaker Batch Transform](https://docs.aws.amazon.com/sagemaker/latest/dg/batch-transform.html) to perform inferences with the model. In Batch Transform, you provide your inference data as an S3 URI, and SageMaker takes care of downloading it, running the predictions, and uploading the results to S3 afterwards.\n", "\n", "If you've trained the model using the **HuggingFace estimator**, you can invoke the `transformer()` method to create a transform job for a model based on the training job.\n", "\n", "```python\n", "batch_job = huggingface_estimator.transformer(\n", " instance_count=1,\n", " instance_type='ml.c5.2xlarge',\n", " strategy='SingleRecord')\n", "\n", "\n", "batch_job.transform(\n", " data='s3://s3-uri-to-batch-data',\n", " content_type='application/json', \n", " split_type='Line')\n", "```\n", "For more details about what can be specified here, see [API docs](https://sagemaker.readthedocs.io/en/stable/overview.html#sagemaker-batch-transform)." ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "### Generate dummy data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We generate 5000 dummy reviews and upload them to S3 as JSON Lines files. We split the reviews across two files, so we specify two instances for our batch transform job."
] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "# Making dummy data\n", "from sagemaker.s3 import S3Uploader, s3_path_join\n", "import random\n", "import json\n", "\n", "batch_reviews_path = f\"s3://{sess.default_bucket()}/{s3_prefix}/batch_transform\"\n", "\n", "\n", "def make_dummy_data(file_id, review_count):\n", "    outputs = []\n", "    file_name = f\"batch_reviews{file_id}.jsonl\"\n", "\n", "    # Generate `review_count` dummy movie reviews\n", "    for _ in range(review_count):\n", "        # Build a fake review text from randomly chosen fragments\n", "        review = \"\"\n", "        review += random.choice(\n", "            [\n", "                \"This movie was \",\n", "                \"I thought that the movie was \",\n", "                \"In my opinion, this was \",\n", "                \"Overall, this movie was \",\n", "            ]\n", "        )\n", "        review += random.choice(\n", "            [\n", "                \"great\",\n", "                \"terrible\",\n", "                \"good\",\n", "                \"bad\",\n", "                \"excellent\",\n", "                \"awful\",\n", "                \"amazing\",\n", "                \"horrible\",\n", "            ]\n", "        )\n", "        review += random.choice([\".\", \"!\", \"...\", \"?\", \"!!\", \"??\"]) + \" \"\n", "        review += \"This is \"\n", "        review += random.choice(\n", "            [\n", "                \"because of the \",\n", "                \"as a result of the \",\n", "                \"in consequence of the \",\n", "                \"as a direct result of the \",\n", "                \"for the following reason: The \",\n", "            ]\n", "        )\n", "        review += random.choice(\n", "            [\n", "                \"storyline.\",\n", "                \"plot.\",\n", "                \"cast.\",\n", "                \"characters.\",\n", "                \"movie quality.\",\n", "                \"director.\",\n", "                \"atmosphere.\",\n", "            ]\n", "        )\n", "        outputs.append({\"inputs\": review})\n", "\n", "    # Write the reviews to a JSON Lines file, one record per line\n", "    with open(file_name, \"w\", encoding=\"utf-8\") as f:\n", "        for output in outputs:\n", "            f.write(json.dumps(output) + \"\\n\")\n", "\n", "    # Upload the file to S3\n", "    s3_file_uri = S3Uploader.upload(file_name, batch_reviews_path)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ 
"make_dummy_data(1, 2500)\n", "make_dummy_data(2, 2500)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Run the Batch Transform job" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We now run the batch transform job, specifying the model data from the estimator, and the IAM Role, as well as the instance count, type, and output path. The `Strategy` parameter refers to the number of records to include in a mini-batch for an HTTP inference request. A record is a single unit of input data that inference can be made on. For example, a single line in a JSON lines file is a record, which is what we are using here. Setting Strategy to `SingleRecord` means that one line in the JSON lines file is used when making an HTTP invocation request to a container.\n", "\n", "Note, if when running the below two cells you encounter a `ResourceLimitExceeded` error message, it is possible that you may need to [request a service quota increase for SageMaker resources](https://docs.aws.amazon.com/sagemaker/latest/dg/regions-quotas.html#service-limit-increase-request-procedure)." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "from sagemaker.huggingface.model import HuggingFaceModel\n", "\n", "huggingface_model = HuggingFaceModel(\n", " model_data=hf_estimator.model_data,\n", " role=role, # iam role with permissions to create an Endpoint\n", " transformers_version=\"4.26\", # transformers version used\n", " pytorch_version=\"1.13\", # pytorch version used\n", " py_version=\"py39\", # python version used\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "# creates Transformer to run our batch job\n", "batch_job = huggingface_model.transformer(\n", " instance_count=2,\n", " instance_type=\"ml.p3.2xlarge\",\n", " output_path=batch_reviews_path, # we are using the same s3 path to save the output with the input\n", " strategy=\"SingleRecord\",\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# starts batch transform job and uses s3 data as input\n", "batch_job.transform(\n", " data=f\"{batch_reviews_path}\",\n", " content_type=\"application/json\",\n", " split_type=\"Line\",\n", " logs=False,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Compare the dummy data to the predicted sentiments" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We then print 10 sentences and compare them to the labels from the batch transform job." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "import json\n", "from sagemaker.s3 import S3Downloader\n", "from ast import literal_eval\n", "\n", "\n", "def create_results(input_file):\n", "    # creating s3 uri for result file -> input file + .out\n", "    output_file = input_file + \".out\"\n", "    output_path = s3_path_join(batch_reviews_path, output_file)\n", "\n", "    # download file\n", "    S3Downloader.download(output_path, \".\")\n", "\n", "    temp_batch_transform_result = []\n", "    with open(output_file) as f:\n", "        for line in f:\n", "            # converts jsonline array to normal array\n", "            line = \"[\" + line.replace(\"[\", \"\").replace(\"]\", \",\") + \"]\"\n", "            temp_batch_transform_result = literal_eval(line)\n", "\n", "    data_temp = []\n", "    with open(input_file) as f:\n", "        for line in f:\n", "            data_temp.append(json.loads(line))\n", "\n", "    # attach each input text to its prediction and map the raw label to a readable one\n", "    for i, line in enumerate(data_temp):\n", "        temp_batch_transform_result[i][\"text\"] = line[\"inputs\"]\n", "        temp_batch_transform_result[i][\"label\"] = (\n", "            \"Positive\" if temp_batch_transform_result[i][\"label\"] == \"LABEL_1\" else \"Negative\"\n", "        )\n", "\n", "    # return the merged results\n", "    return temp_batch_transform_result\n", "\n", "\n", "batch_transform_result = create_results(\"batch_reviews1.jsonl\")\n", "batch_transform_result.extend(create_results(\"batch_reviews2.jsonl\"))\n", "\n", "print(json.dumps(batch_transform_result[:10], indent=4))" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "We can see that the model appears fairly confident in classifying most of the above examples correctly. But is this true for all the generated test data? Let's find out by converting the data into a pandas DataFrame and looking at the most and least confident predictions for both labels."
] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "import pandas as pd\n", "\n", "df_results = pd.DataFrame(batch_transform_result)\n", "pd.set_option(\"display.max_colwidth\", None)\n", "\n", "df_results[df_results[\"label\"] == \"Positive\"].sort_values(\n", " \"score\", ascending=False\n", ").drop_duplicates()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "df_results[df_results[\"label\"] == \"Negative\"].sort_values(\n", " \"score\", ascending=False\n", ").drop_duplicates()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can see that the model in fact struggles with a few examples, particularly where the text is punctuated with question marks.\n", "Finally, we create a box plot to visualize the distribution of the predictions for each label." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "sns.boxplot(\n", " x=\"label\",\n", " y=\"score\",\n", " data=df_results,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can see that the range and interquartile range for the positive label are somewhat larger than for the negative label. This indicates a larger variance in the confidence scores for positive predictions, which further training and hyperparameter optimization may reduce. Note, though, that because this test data is randomly generated, the box plot may differ in your case. One interesting exercise is to generate further test data, send it to the model, and check whether the interquartile range for the positive label is consistently larger than for the negative label. This may help determine whether the model is, in general, more consistently confident in its negative label predictions."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Extras" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Deploy an endpoint\n", "\n", "Uncomment the below three cells to deploy an endpoint, send a request, view the response, and then delete the endpoint.\n", "\n", "To deploy the endpoint, call `deploy()` on the HuggingFace estimator object, passing in the desired number of instances and instance type." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# predictor = hf_estimator.deploy(1, \"ml.p3.2xlarge\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then use the returned predictor object to perform inference." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# sentiment_input = {\"inputs\": \"I love using the new Inference DLC.\"}\n", "\n", "# predictor.predict(sentiment_input)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We see that the fine-tuned model classifies the test sentence \"I love using the new Inference DLC.\" as having positive sentiment with 98% probability!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, delete the endpoint." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# predictor.delete_endpoint()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Estimator Parameters" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Uncomment the below two cells to print more information about the Estimator, as well as the training logs." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "# print(f\"Container image used for training job: \\n{hf_estimator.image_uri}\\n\")\n", "# print(f\"S3 URI where the trained model is located: \\n{hf_estimator.model_data}\\n\")\n", "# print(f\"Latest training job name for this estimator: \\n{hf_estimator.latest_training_job.name}\\n\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# hf_estimator.sagemaker_session.logs_for_job(hf_estimator.latest_training_job.name)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Attach a previous training job to an estimator\n", "\n", "In SageMaker, you can attach a previous training job to an estimator to continue training, get results, etc." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Uncomment the following lines and supply your training job name\n", "\n", "# from sagemaker.estimator import Estimator\n", "# old_training_job_name = \"<your-training-job-name>\"\n", "# hf_estimator_loaded = Estimator.attach(old_training_job_name)\n", "# hf_estimator_loaded.model_data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Notebook CI Test Results\n", "\n", "This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.\n", "\n", "\n", "![ This us-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-east-1/advanced_functionality|sentiment_parallel_batch|huggingface_sentiment_parallel_batch.ipynb)\n", "\n", "![ This us-east-2 badge failed to load. 
Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-east-2/advanced_functionality|sentiment_parallel_batch|huggingface_sentiment_parallel_batch.ipynb)\n", "\n", "![ This us-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-west-1/advanced_functionality|sentiment_parallel_batch|huggingface_sentiment_parallel_batch.ipynb)\n", "\n", "![ This ca-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ca-central-1/advanced_functionality|sentiment_parallel_batch|huggingface_sentiment_parallel_batch.ipynb)\n", "\n", "![ This sa-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/sa-east-1/advanced_functionality|sentiment_parallel_batch|huggingface_sentiment_parallel_batch.ipynb)\n", "\n", "![ This eu-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-west-1/advanced_functionality|sentiment_parallel_batch|huggingface_sentiment_parallel_batch.ipynb)\n", "\n", "![ This eu-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-west-2/advanced_functionality|sentiment_parallel_batch|huggingface_sentiment_parallel_batch.ipynb)\n", "\n", "![ This eu-west-3 badge failed to load. 
Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-west-3/advanced_functionality|sentiment_parallel_batch|huggingface_sentiment_parallel_batch.ipynb)\n", "\n", "![ This eu-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-central-1/advanced_functionality|sentiment_parallel_batch|huggingface_sentiment_parallel_batch.ipynb)\n", "\n", "![ This eu-north-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-north-1/advanced_functionality|sentiment_parallel_batch|huggingface_sentiment_parallel_batch.ipynb)\n", "\n", "![ This ap-southeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-southeast-1/advanced_functionality|sentiment_parallel_batch|huggingface_sentiment_parallel_batch.ipynb)\n", "\n", "![ This ap-southeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-southeast-2/advanced_functionality|sentiment_parallel_batch|huggingface_sentiment_parallel_batch.ipynb)\n", "\n", "![ This ap-northeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-northeast-1/advanced_functionality|sentiment_parallel_batch|huggingface_sentiment_parallel_batch.ipynb)\n", "\n", "![ This ap-northeast-2 badge failed to load. 
Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-northeast-2/advanced_functionality|sentiment_parallel_batch|huggingface_sentiment_parallel_batch.ipynb)\n", "\n", "![ This ap-south-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-south-1/advanced_functionality|sentiment_parallel_batch|huggingface_sentiment_parallel_batch.ipynb)\n", "\n" ] } ], "metadata": { "availableInstances": [ { "_defaultOrder": 0, "_isFastLaunch": true, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 4, "name": "ml.t3.medium", "vcpuNum": 2 }, { "_defaultOrder": 1, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.t3.large", "vcpuNum": 2 }, { "_defaultOrder": 2, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.t3.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 3, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.t3.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 4, "_isFastLaunch": true, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.m5.large", "vcpuNum": 2 }, { "_defaultOrder": 5, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.m5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 6, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.m5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 7, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.m5.4xlarge", 
"vcpuNum": 16 }, { "_defaultOrder": 8, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.m5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 9, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.m5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 10, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.m5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 11, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.m5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 12, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.m5d.large", "vcpuNum": 2 }, { "_defaultOrder": 13, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.m5d.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 14, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.m5d.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 15, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.m5d.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 16, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.m5d.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 17, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.m5d.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 18, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.m5d.16xlarge", "vcpuNum": 64 }, { 
"_defaultOrder": 19, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.m5d.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 20, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": true, "memoryGiB": 0, "name": "ml.geospatial.interactive", "supportedImageNames": [ "sagemaker-geospatial-v1-0" ], "vcpuNum": 0 }, { "_defaultOrder": 21, "_isFastLaunch": true, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 4, "name": "ml.c5.large", "vcpuNum": 2 }, { "_defaultOrder": 22, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.c5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 23, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.c5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 24, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.c5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 25, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 72, "name": "ml.c5.9xlarge", "vcpuNum": 36 }, { "_defaultOrder": 26, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 96, "name": "ml.c5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 27, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 144, "name": "ml.c5.18xlarge", "vcpuNum": 72 }, { "_defaultOrder": 28, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.c5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 29, "_isFastLaunch": true, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 16, 
"name": "ml.g4dn.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 30, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.g4dn.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 31, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.g4dn.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 32, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.g4dn.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 33, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.g4dn.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 34, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.g4dn.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 35, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 61, "name": "ml.p3.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 36, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 244, "name": "ml.p3.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 37, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 488, "name": "ml.p3.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 38, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 768, "name": "ml.p3dn.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 39, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.r5.large", "vcpuNum": 2 }, { "_defaultOrder": 40, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, 
"hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.r5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 41, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.r5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 42, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.r5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 43, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.r5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 44, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.r5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 45, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 512, "name": "ml.r5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 46, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 768, "name": "ml.r5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 47, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.g5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 48, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.g5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 49, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.g5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 50, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.g5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 51, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, 
"hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.g5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 52, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.g5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 53, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.g5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 54, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 768, "name": "ml.g5.48xlarge", "vcpuNum": 192 }, { "_defaultOrder": 55, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 1152, "name": "ml.p4d.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 56, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 1152, "name": "ml.p4de.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 57, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.trn1.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 58, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 512, "name": "ml.trn1.32xlarge", "vcpuNum": 128 }, { "_defaultOrder": 59, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 512, "name": "ml.trn1n.32xlarge", "vcpuNum": 128 } ], "instance_type": "ml.t3.medium", "interpreter": { "hash": "c281c456f1b8161c8906f4af2c08ed2c40c50136979eaae69688b01f70e9f4a9" }, "kernelspec": { "display_name": "Python 3 (PyTorch 1.13 Python 3.9 CPU Optimized)", "language": "python", "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-west-2:000000000000:image/pytorch-1.13-cpu-py39" }, "language_info": { "codemirror_mode": { "name": "ipython", 
"version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.16" } }, "nbformat": 4, "nbformat_minor": 4 }